Xiaomi Open-Sources Its First Native End-to-End Large Speech Model, Xiaomi-MiMo-Audio
On September 19, Xiaomi announced the open-sourcing of its first native end-to-end large speech model, Xiaomi-MiMo-Audio, marking a major breakthrough in speech technology. Five years ago, the emergence of GPT-3 opened a new era for artificial general intelligence (AGI) in language, but the speech field has long been constrained by its reliance on large-scale annotated data, which has made it difficult to achieve the few-shot generalization that language models exhibit. Now, Xiaomi's Xiaomi-MiMo-Audio model is based on innovative pre-training